Conversation
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthrough정책 문서 검색에 Cohere 기반 리랭킹을 도입하고, 회원 프로필 컨텍스트 빌더에 추가 필드와 포맷 헬퍼를 적용했습니다. 관련 DTO, 클라이언트, 설정 및 검색 흐름이 추가/수정되었습니다. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client as Client
participant Retriever as PolicyDocumentRetriever
participant VectorStore as Policy Vector Store
participant CohereClient as Cohere Rerank Client
participant CohereAPI as Cohere API
Client->>Retriever: retrievePolicy(query, member)
Retriever->>VectorStore: similaritySearch(query, topK=candidatePoolTopK)
VectorStore-->>Retriever: candidates (larger set)
Retriever->>CohereClient: rerank(query, candidates, policyTopK)
CohereClient->>CohereAPI: POST /v2/rerank (query, documents, topN)
CohereAPI-->>CohereClient: CohereRerankResponse (results with indices)
CohereClient->>CohereClient: map results -> Document list by index
CohereClient-->>Retriever: reranked documents
Retriever-->>Client: final topK results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/main/java/org/sopt/kareer/global/external/cohere/service/CohereRerankClient.java (1)
24-30: RestClient 인스턴스 재사용 권장
restClient()메서드가 호출될 때마다 새로운RestClient인스턴스를 생성합니다.RestClient는 스레드 안전하므로 빈 초기화 시 한 번만 생성하여 재사용하는 것이 효율적입니다. 또한 커넥션/읽기 타임아웃 설정 추가도 권장합니다.♻️ RestClient 싱글톤 및 타임아웃 적용
+import org.springframework.http.client.SimpleClientHttpRequestFactory; +import jakarta.annotation.PostConstruct; +import java.time.Duration; + `@Slf4j` `@Component` `@RequiredArgsConstructor` public class CohereRerankClient { private final CohereProperties cohereProperties; + private RestClient restClient; - private RestClient restClient() { - return RestClient.builder() + `@PostConstruct` + private void init() { + SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory(); + factory.setConnectTimeout(Duration.ofSeconds(5)); + factory.setReadTimeout(Duration.ofSeconds(10)); + + this.restClient = RestClient.builder() .baseUrl(cohereProperties.baseUrl()) .defaultHeader(HttpHeaders.AUTHORIZATION, "Bearer " + cohereProperties.apiKey()) .defaultHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE) + .requestFactory(factory) .build(); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/org/sopt/kareer/global/external/cohere/service/CohereRerankClient.java` around lines 24 - 30, The restClient() method currently builds a new RestClient on every call; modify CohereRerankClient to create a single reusable RestClient instance (e.g., a private final field initialized once in the constructor or a `@Bean`) instead of recreating it per call, and configure connection/read timeouts on that RestClient; ensure the construction uses cohereProperties.baseUrl() and cohereProperties.apiKey() (same values currently used) and replace usages of restClient() to use the stored instance.src/main/resources/application.yml (1)
111-116: Cohere API 호출에 대한 타임아웃 설정 고려Cohere API 호출 시 타임아웃이 설정되어 있지 않습니다. 외부 API 호출이 지연될 경우 요청 스레드가 장시간 블로킹될 수 있습니다.
read-timeout-ms와connect-timeout-ms설정 추가를 권장합니다.♻️ 타임아웃 설정 추가 예시
cohere: api-key: ${COHERE_API_KEY} base-url: https://api.cohere.com + connect-timeout-ms: 5000 + read-timeout-ms: 10000 rerank: model: rerank-multilingual-v3.0 top-n: 10🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/resources/application.yml` around lines 111 - 116, Add explicit HTTP timeouts to the Cohere configuration by introducing properties for connect and read timeouts (e.g., cohere.connect-timeout-ms and cohere.read-timeout-ms) alongside the existing base-url and api-key entries; update the config block that contains keys like cohere.api-key, cohere.base-url, and cohere.rerank to include these timeout values with sensible defaults (for example 5000ms connect, 30000ms read) so the client code that reads cohere.* settings can use them to avoid blocking on slow external calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@src/main/java/org/sopt/kareer/global/external/ai/builder/context/MemberContextBuilder.java`:
- Line 48: The MemberContextBuilder currently appends the "preparationStatus"
key twice via appendLine(sb, "preparationStatus",
member.getPreparationStatus()); remove the duplicate call so "preparationStatus"
is only appended once (keep the single correct appendLine usage), and ensure any
surrounding logic in the build method still preserves the intended order of
context entries.
In
`@src/main/java/org/sopt/kareer/global/external/ai/service/PolicyDocumentRetriever.java`:
- Around line 27-34: The fallback behavior returns the full candidates list
(size candidatePoolTopK) when CohereRerankClient.rerank(...) fails, but callers
expect props.policyTopK() items; update the catch/fallback path so that when
rerank throws or returns null you trim or select the top props.policyTopK()
entries from the existing candidates list (from
policyDocumentVectorStore.similaritySearch) before returning, referencing the
variables/methods CohereRerankClient.rerank, candidates, and props.policyTopK()
to locate and implement the change.
In
`@src/main/java/org/sopt/kareer/global/external/cohere/service/CohereRerankClient.java`:
- Around line 58-61: The current fallback in CohereRerankClient returns the
original documents when response is empty which can mismatch the expected topN
size; update the fallback to return a list whose size matches the requested topN
(or candidatePoolTopK if topN not passed) by trimming documents if longer or
padding/duplicating with the best-available items if shorter, and ensure this
logic is used in the method that processes response and returns 'documents'
(refer to variables/methods: response, documents, CohereRerankClient,
PolicyDocumentRetriever, candidatePoolTopK, topN) so callers always receive a
consistently-sized result.
- Around line 63-67: The loop in CohereRerankClient that builds 'reranked' uses
result.index() from response.results() without bounds checks, which can throw
IndexOutOfBoundsException if the Cohere response is invalid; update the code in
CohereRerankClient (where response.results() is iterated and
documents.get(result.index()) is called) to validate that result.index() is
within 0..documents.size()-1 before accessing documents, and on invalid indexes
either skip the entry and log a warning including the offending index and the
documents size or throw a controlled custom exception so the caller can handle
it.
---
Nitpick comments:
In
`@src/main/java/org/sopt/kareer/global/external/cohere/service/CohereRerankClient.java`:
- Around line 24-30: The restClient() method currently builds a new RestClient
on every call; modify CohereRerankClient to create a single reusable RestClient
instance (e.g., a private final field initialized once in the constructor or a
`@Bean`) instead of recreating it per call, and configure connection/read timeouts
on that RestClient; ensure the construction uses cohereProperties.baseUrl() and
cohereProperties.apiKey() (same values currently used) and replace usages of
restClient() to use the stored instance.
In `@src/main/resources/application.yml`:
- Around line 111-116: Add explicit HTTP timeouts to the Cohere configuration by
introducing properties for connect and read timeouts (e.g.,
cohere.connect-timeout-ms and cohere.read-timeout-ms) alongside the existing
base-url and api-key entries; update the config block that contains keys like
cohere.api-key, cohere.base-url, and cohere.rerank to include these timeout
values with sensible defaults (for example 5000ms connect, 30000ms read) so the
client code that reads cohere.* settings can use them to avoid blocking on slow
external calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: bccdff63-ac89-4f46-96e7-9ac56934de0a
📒 Files selected for processing (7)
src/main/java/org/sopt/kareer/global/external/ai/builder/context/MemberContextBuilder.javasrc/main/java/org/sopt/kareer/global/external/ai/service/PolicyDocumentRetriever.javasrc/main/java/org/sopt/kareer/global/external/cohere/dto/request/CohereRerankRequest.javasrc/main/java/org/sopt/kareer/global/external/cohere/dto/response/CohereRerankResponse.javasrc/main/java/org/sopt/kareer/global/external/cohere/properties/CohereProperties.javasrc/main/java/org/sopt/kareer/global/external/cohere/service/CohereRerankClient.javasrc/main/resources/application.yml
| if (response == null || response.results() == null || response.results().isEmpty()) { | ||
| log.warn("Cohere rerank response empty. query={}", query); | ||
| return documents; | ||
| } |
There was a problem hiding this comment.
응답이 비어있을 때의 fallback 동작 확인 필요
응답이 null이거나 비어있을 때 원본 documents(candidatePoolTopK 크기)를 그대로 반환합니다. 이 경우에도 호출부에서 기대하는 topN 크기와 불일치가 발생합니다. PolicyDocumentRetriever에서 언급한 것처럼 일관된 크기 처리가 필요합니다.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@src/main/java/org/sopt/kareer/global/external/cohere/service/CohereRerankClient.java`
around lines 58 - 61, The current fallback in CohereRerankClient returns the
original documents when response is empty which can mismatch the expected topN
size; update the fallback to return a list whose size matches the requested topN
(or candidatePoolTopK if topN not passed) by trimming documents if longer or
padding/duplicating with the best-available items if shorter, and ensure this
logic is used in the method that processes response and returns 'documents'
(refer to variables/methods: response, documents, CohereRerankClient,
PolicyDocumentRetriever, candidatePoolTopK, topN) so callers always receive a
consistently-sized result.
| List<Document> reranked = new ArrayList<>(); | ||
| for (CohereRerankResponse.Result result : response.results()) { | ||
| reranked.add(documents.get(result.index())); | ||
| } | ||
| return reranked; |
There was a problem hiding this comment.
Cohere 응답의 index 값 유효성 검증 필요
Cohere API 응답의 result.index()가 입력 documents 리스트의 범위를 벗어날 경우 IndexOutOfBoundsException이 발생할 수 있습니다. 외부 API 응답이므로 방어적 검증을 추가하는 것이 안전합니다.
🛡️ 인덱스 유효성 검증 추가
List<Document> reranked = new ArrayList<>();
for (CohereRerankResponse.Result result : response.results()) {
+ if (result.index() < 0 || result.index() >= documents.size()) {
+ log.warn("Cohere returned invalid index: {}. Skipping.", result.index());
+ continue;
+ }
reranked.add(documents.get(result.index()));
}
return reranked;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@src/main/java/org/sopt/kareer/global/external/cohere/service/CohereRerankClient.java`
around lines 63 - 67, The loop in CohereRerankClient that builds 'reranked' uses
result.index() from response.results() without bounds checks, which can throw
IndexOutOfBoundsException if the Cohere response is invalid; update the code in
CohereRerankClient (where response.results() is iterated and
documents.get(result.index()) is called) to validate that result.index() is
within 0..documents.size()-1 before accessing documents, and on invalid indexes
either skip the entry and log a warning including the offending index and the
documents size or throw a controlled custom exception so the caller can handle
it.
Related issue 🛠
Work Description 📝
Summary by CodeRabbit
릴리스 노트